# Image-text matching

| Model | License | Description | Tags | Author | Downloads | Likes |
| --- | --- | --- | --- | --- | --- | --- |
| Cultureclip | Not listed | Vision-language model fine-tuned from CLIP-ViT-B/32, suited to image-text matching tasks. | Text-to-Image, Transformers | lukahh | 20 | 0 |
| Mexma Siglip2 | MIT | MEXMA-SigLIP2 is a CLIP model that combines the MEXMA multilingual text encoder with the SigLIP2 image encoder and supports 80 languages. | Text-to-Image, Supports Multiple Languages | visheratin | 224 | 4 |
| Clip Vit Tiny Random Patch14 336 | Not listed | Small ViT-based CLIP model with randomly initialized weights, intended for debugging. | Text-to-Image, Transformers | yujiepan | 14.47k | 0 |
| Longclip GmP ViT L 14 | Not listed | CLIP model fine-tuned from BeichenZhang/LongCLIP-L that supports long-text input (248 tokens), with performance improved by Geometric Parameterization (GmP). | Text-to-Image, Transformers | zer0int | 4,859 | 61 |
| Resnet101 Clip.openai | MIT | CLIP model based on the ResNet101 architecture that supports zero-shot image classification. | Image Classification | timm | 2,717 | 0 |
| Resnet50 Clip.openai | MIT | Zero-shot image classification model based on the ResNet50 architecture and CLIP. | Image Classification | timm | 11.91k | 0 |